Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web

Authors

  • Sanjiv R. Das
  • Mike Y. Chen
Abstract

We develop a methodology for extracting small investor sentiment from stock message boards. Five distinct classifier algorithms coupled by a voting scheme are found to perform well against human and statistical benchmarks. Time series and cross-sectional aggregation of message information improves the quality of the resultant sentiment index. Empirical applications evidence a relationship with stock returns: visually, using phase-lag analysis, pattern recognition and statistical methods. Sentiment has an idiosyncratic component, and aggregation of sentiment across stocks tracks index returns more strongly than with individual stocks. Preliminary evidence suggests that market activity influences small investor sentiment. Thus, the algorithms developed in this paper may be used to assess the impact on investor opinion of management announcements, press releases, third-party news, and regulatory changes.

∗ We owe a special debt to the creative environments at UC Berkeley’s Computer Science Division and Haas School, where this work was begun. Thanks to David Levine for many comments and for the title. We are grateful to Vikas Agarwal, Chris Brooks, Yuk-Shee Chan, David Gibson, Geoffrey Friesen, David Leinweber, Asis Martinez-Jerez, Priya Raghubir, Sridhar Rajagopalan, Ajit Ranade, Mark Rubinstein, Peter Tufano, Raman Uppal, Shiv Vaithyanathan, Robert Wilensky and seminar participants at Northwestern University, UC Berkeley-EECS, London Business School, University of Wisconsin, Madison, the Multinational Finance Conference, Italy, the Asia Pacific Finance Association Meetings, Bangkok, and the European Finance Association Meetings, Barcelona, for helpful discussions and insights. Danny Tom and Jason Waddle were instrumental in delivering insights into this paper through joint work on alternative techniques via support vector machines. The first author gratefully acknowledges support from the Price Waterhouse Cooper’s Risk Institute, the Dean Witter Foundation, and a Research Grant from Santa Clara University.

Please address all correspondence to Professor Sanjiv Das, Breetwor Fellow & Associate Professor, Santa Clara University, Leavey School of Business, Dept of Finance, 208 Kenna Hall, Santa Clara, CA 95053-0388. Email: [email protected]. Mike Chen is in the Computer Science Division, UC Berkeley, [email protected].
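The abstract describes five classifiers "coupled by a voting scheme" but gives no details of the scheme. As a rough illustration only, the sketch below assumes a simple majority rule: a message receives a label when at least three of the five classifiers agree, and is otherwise left unclassified. The function name `majority_vote`, the `min_votes` threshold, and the label strings are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def majority_vote(
    labels: Sequence[str],
    min_votes: int = 3,
    fallback: str = "unclassified",
) -> str:
    """Combine per-classifier labels for one message by majority vote.

    Assumption (not from the paper): a label is accepted only if at
    least `min_votes` classifiers agree; otherwise `fallback` is returned.
    """
    if not labels:
        return fallback
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_votes else fallback


# Hypothetical outputs of five classifiers for a single message.
votes = ["buy", "buy", "sell", "buy", "hold"]
result = majority_vote(votes)
```

Requiring a minimum level of agreement trades coverage for precision: fewer messages are labelled, but the labels that survive are backed by most of the classifiers.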

Similar resources

Building Knowledge Bases from the Web

The web is a vast repository of human knowledge. Extracting structured data from web pages can enable applications like comparison shopping, and lead to improved ranking and rendering of search results. In this talk, I will describe two efforts to extract records from pages at web scale. The first is a wrapper induction system that handles end-to-end extraction tasks from clustering web pages t...

2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework

Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...

Building Knowledge-bases from the Web

The web is a vast repository of information. Most of the information on the web is meant for human consumption. Extracting structured information from the web can enable several applications like advanced ranking, semantic search, etc. In this talk, we first list different types of content available on the web, survey known techniques for extracting information from them, present the architectu...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Sentiment strength detection for the social web

Mike Thelwall, Kevan Buckley, Georgios Paltoglou. Statistical Cybermetrics Research Group, School of Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK. E-mail: [email protected], [email protected], [email protected]. Tel: +44 1902 321470. Fax: +44 1902 321478.

Sentiment analysis is concerned with the automatic extraction of sentiment-related information ...

Journal title:
  • Management Science

Volume 53

Publication date: 2007